Goto

Collaborating Authors

 Minnetonka


Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis

Kodathala, Sai Varun, Vunnam, Rakesh

arXiv.org Artificial Intelligence

This study presents a comprehensive comparative analysis of two prominent self-supervised learning architectures for video action recognition: DINOv3, which processes frames independently through spatial feature extraction, and V-JEPA2, which employs joint temporal modeling across video sequences. We evaluate both approaches on the UCF Sports dataset, examining feature quality through multiple dimensions including classification accuracy, clustering performance, intra-class consistency, and inter-class discrimination. Our analysis reveals fundamental architectural trade-offs: DINOv3 achieves superior clustering performance (Silhouette score: 0.31 vs 0.21) and demonstrates exceptional discrimination capability (6.16x separation ratio) particularly for pose-identifiable actions, while V-JEPA2 exhibits consistent reliability across all action types with significantly lower performance variance (0.094 vs 0.288). Through action-specific evaluation, we identify that DINOv3's spatial processing architecture excels at static pose recognition but shows degraded performance on motion-dependent actions, whereas V-JEPA2's temporal modeling provides balanced representation quality across diverse action categories. These findings contribute to the understanding of architectural design choices in video analysis systems and provide empirical guidance for selecting appropriate feature extraction methods based on task requirements and reliability constraints.


Fast OTSU Thresholding Using Bisection Method

Kodathala, Sai Varun

arXiv.org Artificial Intelligence

The Otsu thresholding algorithm represents a fundamental technique in image segmentation, yet its computational efficiency is severely limited by exhaustive search requirements across all possible threshold values. This work presents an optimized implementation that leverages the bisection method to exploit the unimodal characteristics of the between-class variance function. Our approach reduces the computational complexity from O(L) to O(log L) evaluations while preserving segmentation accuracy. Experimental validation on 48 standard test images demonstrates a 91.63% reduction in variance computations and 97.21% reduction in algorithmic iterations compared to conventional exhaustive search. The bisection method achieves exact threshold matches in 66.67% of test cases, with 95.83% exhibiting deviations within 5 gray levels. The algorithm maintains universal convergence within theoretical logarithmic bounds while providing deterministic performance guarantees suitable for real-time applications. This optimization addresses critical computational bottlenecks in large-scale image processing systems without compromising the theoretical foundations or segmentation quality of the original Otsu method.


Give me Some Hard Questions: Synthetic Data Generation for Clinical QA

Bai, Fan, Harrigian, Keith, Stremmel, Joel, Hassanzadeh, Hamid, Saeedi, Ardavan, Dredze, Mark

arXiv.org Artificial Intelligence

Clinical Question Answering (QA) systems enable doctors to quickly access patient information from electronic health records (EHRs). However, training these systems requires significant annotated data, which is limited due to the expertise needed and the privacy concerns associated with clinical data. This paper explores generating Clinical QA data using large language models (LLMs) in a zero-shot setting. We find that naive prompting often results in easy questions that do not reflect the complexity of clinical scenarios. To address this, we propose two prompting strategies: 1) instructing the model to generate questions that do not overlap with the input context, and 2) summarizing the input record using a predefined schema to scaffold question generation. Experiments on two Clinical QA datasets demonstrate that our method generates more challenging questions, significantly improving fine-tuning performance over baselines. We compare synthetic and gold data and find a gap between their training efficacy resulting from the quality of synthetically generated answers.


XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering

Stremmel, Joel, Saeedi, Ardavan, Hassanzadeh, Hamid, Batra, Sanjit, Hertzberg, Jeffrey, Murillo, Jaime, Halperin, Eran

arXiv.org Artificial Intelligence

Extractive question answering (QA) systems can enable physicians and researchers to query medical records, a foundational capability for designing clinical studies and understanding patient medical history. However, building these systems typically requires expert-annotated QA pairs. Large language models (LLMs), which can perform extractive QA, depend on high quality data in their prompts, specialized for the application domain. We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records. Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes. In an expert evaluation with two physicians, our method identifies $2.2\times$ more semantic matches and $3.8\times$ more clinical abbreviations than two popular approaches that use sentence transformers to create QA pairs. In an ML evaluation, adding our QA pairs improves performance of GPT-4 as an extractive QA model, including on difficult questions. In both the expert and ML evaluations, we examine trade-offs between our method and sentence transformers for QA pair generation depending on question difficulty.


Surpassing GPT-4 Medical Coding with a Two-Stage Approach

Yang, Zhichao, Batra, Sanjit Singh, Stremmel, Joel, Halperin, Eran

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) show potential for clinical applications, such as clinical decision support and trial recommendations. However, the GPT-4 LLM predicts an excessive number of ICD codes for medical coding tasks, leading to high recall but low precision. To tackle this challenge, we introduce LLM-codex, a two-stage approach to predict ICD codes that first generates evidence proposals using an LLM and then employs an LSTM-based verification stage. The LSTM learns from both the LLM's high recall and human expert's high precision, using a custom loss function. Our model is the only approach that simultaneously achieves state-of-the-art results in medical coding accuracy, accuracy on rare codes, and sentence-level evidence identification to support coding decisions without training on human-annotated evidence according to experiments on the MIMIC dataset.


Senior Manager of BI & Data Analytics at Marco Technologies LLC - Minnetonka, MN

#artificialintelligence

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.


Extend and Explain: Interpreting Very Long Language Models

Stremmel, Joel, Hill, Brian L., Hertzberg, Jeffrey, Murillo, Jaime, Allotey, Llewelyn, Halperin, Eran

arXiv.org Artificial Intelligence

While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and validate our approach with a blind review by two clinicians. Our method identifies about 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs. MSP is particularly well-suited to long LMs but can be applied to any text classifier. We provide a general implementation of MSP.